12 research outputs found

    Performance en classification de données textuelles des passages aux urgences des modèles BERT pour le français

    Contextualized language models based on the Transformer architecture, such as BERT (Bidirectional Encoder Representations from Transformers), have achieved remarkable performance in various language processing tasks. CamemBERT and FlauBERT are pre-trained versions for French. We used these two models to automatically classify free-text clinical notes from emergency department visits following a trauma. Their performance was compared with that of the TF-IDF (Term Frequency-Inverse Document Frequency) method combined with an SVM (Support Vector Machine) classifier on 22,481 clinical notes from the emergency department of the Bordeaux University Hospital. CamemBERT and FlauBERT obtained slightly better micro F1-scores than the TF-IDF/SVM pair. These encouraging results support further work on using transformers to automate the processing of emergency department data, with a view to establishing a national trauma observatory in France.

    Development and Validation of Deep Learning Transformer Models for Building a Comprehensive and Real-time Trauma Observatory

    BACKGROUND AND OBJECTIVE To study the feasibility of setting up a national trauma observatory in France, we compared the performance of several automatic language processing methods on a multi-class classification task of unstructured clinical notes. METHODS A total of 69,110 free-text clinical notes related to visits to the emergency departments of the University Hospital of Bordeaux, France, between 2012 and 2019 were manually annotated. Among those clinical notes, 22,481 were traumas. We trained 4 transformer models (deep learning models that incorporate an attention mechanism) and compared them with the TF-IDF (Term Frequency-Inverse Document Frequency) method combined with an SVM (Support Vector Machine). RESULTS The transformer models consistently performed better than TF-IDF/SVM. Among the transformers, the GPTanam model, pre-trained on a French corpus with an additional self-supervised learning step on 306,368 unlabeled clinical notes, showed the best performance, with a micro F1-score of 0.969. CONCLUSIONS The transformers proved efficient at the multi-class classification of narrative medical data. Further steps for improvement should focus on abbreviation expansion and multi-output multi-class classification.
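
The micro F1-score reported in these abstracts pools true positives, false positives, and false negatives over all classes before computing F1. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `micro_f1` and the class labels in the usage note are hypothetical and chosen only for illustration.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multi-class predictions.

    Counts are pooled over all classes before F1 is computed.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())

    denominator = 2 * total_tp + total_fp + total_fn
    # Convention assumed here: return 0.0 when there is nothing to score.
    return 2 * total_tp / denominator if denominator else 0.0
```

For example, with hypothetical labels `["fall", "road", "fall", "assault", "road"]` as the reference and `["fall", "fall", "fall", "assault", "road"]` as the predictions, four of five notes are classified correctly and the function returns 0.8. When each note receives exactly one label, micro F1 coincides with accuracy.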

    Una herramienta de toma de decisiones para ajustar los niveles anormales en las pruebas de conteo sanguíneo completo

    The complete blood count (CBC) performed with automated hematology analyzers is one of the most frequently ordered laboratory tests. Used as a first-line tool for health screening, diagnosis, and patient follow-up, the CBC influences most medical decisions. If the analysis does not match what is expected, laboratory staff manually review a blood smear, which takes time. CBC review criteria are based on international consensus guidelines and are adapted locally to account for laboratory resources and population characteristics. In this work, our objective is to provide a decision support tool for the clinical laboratory that identifies which CBC variables are associated with a higher risk of an abnormal manual smear, and at which threshold values. We therefore treat the adjustment of criteria as a feature selection problem. We propose a cost-sensitive, Lasso-penalized additive logistic regression combined with a stability selection criterion, in order to account for the particularities of the data and the context: strong class imbalance, categorization of continuous predictors, and the need for stable, interpretable results. Our proposal is competitive in terms of prediction (compared with deep neural networks) and in terms of model selection (provided there are enough data in the neighborhood of the true threshold values). The R package CBCtools is publicly available. This work was done in collaboration with Hélène Touchais, Inria Bordeaux, and Marcela Henríquez Henríquez, BUPA Chil

    A decision-making tool to fine-tune abnormal levels in the complete blood count tests

    The complete blood count (CBC) performed by automated hematology analyzers is one of the most frequently ordered laboratory tests. It is a first-line tool for assessing a patient's general health status and for diagnosing and monitoring disease progression. When the analysis does not fit an expected setting, technologists manually review a blood smear using a microscope. In 2005, the International Consensus Group for Hematology Review published a set of criteria for reviewing CBCs. Local adjustments are commonly needed to account for laboratory resources and population characteristics. Our objective is to provide a decision support tool that identifies which CBC variables are associated with a higher risk of an abnormal smear, and at which cutoff values. We propose a cost-sensitive Lasso-penalized additive logistic regression combined with stability selection. Using simulated and real CBC data, we demonstrate that our tool correctly identifies the true cutoff values, provided that enough data are available in their neighbourhood.

    Traitement automatique des résumés de passages aux urgences : focus sur la désidentification

    In France, structured data on emergency room visits are aggregated at the national level to build a syndromic surveillance system for different health events. For visits motivated by a traumatic event, information on the circumstances is stored in free-text clinical notes. Automating the processing of these notes should make it possible to enrich surveillance tools. Under development at Inserm and the Emergency Department of the Bordeaux University Hospital, the TARPON project (Traitement Automatique des Résumés de Passages aux urgences pour un Observatoire National, that is, Automatic Processing of Emergency Room Notes for a National Observatory) aims to meet this objective by applying the latest deep learning tools to automatic language analysis. Exploiting these data requires an automatic de-identification system that guarantees the protection of personal data. We present here a comparative study of models for the de-identification of clinical texts in French.

    Deep Learning Transformer Models for Building a Comprehensive and Real-time Trauma Observatory: Development and Validation Study

    Background: Public health surveillance relies on the collection of data, often in near-real time. Recent advances in natural language processing make it possible to envisage an automated system for extracting information from electronic health records. Objective: To study the feasibility of setting up a national trauma observatory in France, we compared the performance of several automatic language processing methods in a multiclass classification task of unstructured clinical notes. Methods: A total of 69,110 free-text clinical notes related to visits to the emergency departments of the University Hospital of Bordeaux, France, between 2012 and 2019 were manually annotated. Among these clinical notes, 32.5% (22,481/69,110) were traumas. We trained 4 transformer models (deep learning models that encompass attention mechanism) and compared them with the term frequency–inverse document frequency associated with the support vector machine method. Results: The transformer models consistently performed better than the term frequency–inverse document frequency and a support vector machine. Among the transformers, the GPTanam model pretrained with a French corpus with an additional autosupervised learning step on 306,368 unlabeled clinical notes showed the best performance with a micro F1-score of 0.969. Conclusions: The transformers proved efficient at the multiclass classification of narrative and medical data. Further steps for improvement should focus on the expansion of abbreviations and multioutput multiclass classification.

    A physiologically based pharmacokinetic (PBPK) model exploring the blood-milk barrier in lactating species - A case study with oxytetracycline administered to dairy cows and goats

    Antibiotic excretion into milk depends on several factors, such as the compound's physicochemical properties, the animal's physiology, and the milk composition. The objective of this study was to develop a physiologically based pharmacokinetic (PBPK) model describing the passage of drugs into the milk of lactating species. The udder is described as a permeability-limited compartment divided into vascular, extracellular water (EW), intracellular water (IW), and milk sub-compartments, with milk stored in alveolar and cistern compartments. The pH and ionization in each compartment, and the binding to IW components and to milk fat, casein, whey protein, calcium, and magnesium, were considered. Bidirectional passive diffusion across the blood-milk barrier was implemented, based on in vitro permeability studies. The model successfully predicted the distribution of oxytetracycline in cow and goat milk after different doses and routes of administration. By integrating inter-individual variability and uncertainty, the model also allowed a suitable estimation of the withdrawal periods. Further work is in progress to evaluate the predictive ability of the PBPK model for compounds with different physicochemical properties that are potentially actively transported, in order to extrapolate the excretion of xenobiotics in the milk of various animal species, including humans.